Pesquisa | Portal Regional da BVS

PWHATSHAP: efficient haplotyping for future generation sequencing.

Bracciali, Andrea; Aldinucci, Marco; Patterson, Murray; Marschall, Tobias; Pisanti, Nadia; Merelli, Ivan; Torquati, Massimo.

BMC Bioinformatics ; 17(Suppl 11): 342, 2016 Sep 22.

Artigo em Inglês | MEDLINE | ID: mdl-28185544

RESUMO

BACKGROUND: Haplotype phasing is an important problem in the analysis of genomics information. Given a set of DNA fragments of an individual, it consists of determining which one of the possible alleles (alternative forms of a gene) each fragment comes from. Haplotype information is relevant to gene regulation, epigenetics, genome-wide association studies, evolutionary and population studies, and the study of mutations. Haplotyping is currently addressed as an optimisation problem aiming at solutions that minimise, for instance, error correction costs, where costs are a measure of the confidence in the accuracy of the information acquired from DNA sequencing. Solutions have typically an exponential computational complexity. WHATSHAP is a recent optimal approach which moves computational complexity from DNA fragment length to fragment overlap, i.e., coverage, and is hence of particular interest when considering sequencing technology's current trends that are producing longer fragments. RESULTS: Given the potential relevance of efficient haplotyping in several analysis pipelines, we have designed and engineered PWHATSHAP, a parallel, high-performance version of WHATSHAP. PWHATSHAP is embedded in a toolkit developed in Python and supports genomics datasets in standard file formats. Building on WHATSHAP, PWHATSHAP exhibits the same complexity exploring a number of possible solutions which is exponential in the coverage of the dataset. The parallel implementation on multi-core architectures allows for a relevant reduction of the execution time for haplotyping, while the provided results enjoy the same high accuracy as that provided by WHATSHAP, which increases with coverage. CONCLUSIONS: Due to its structure and management of the large datasets, the parallelisation of WHATSHAP posed demanding technical challenges, which have been addressed exploiting a high-level parallel programming framework. The result, PWHATSHAP, is a freely available toolkit that improves the efficiency of the analysis of genomics information.

Assuntos

Algoritmos , Biologia Computacional/métodos , Genoma Humano , Haplótipos/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Polimorfismo de Nucleotídeo Único/genética , Análise de Sequência de DNA/métodos , Genética Populacional , Genômica/métodos , Humanos

Sequence alignment tools: one parallel pattern to rule them all?

Misale, Claudia; Ferrero, Giulio; Torquati, Massimo; Aldinucci, Marco.

Biomed Res Int ; 2014: 539410, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25147803

RESUMO

In this paper, we advocate high-level programming methodology for next generation sequencers (NGS) alignment tools for both productivity and absolute performance. We analyse the problem of parallel alignment and review the parallelisation strategies of the most popular alignment tools, which can all be abstracted to a single parallel paradigm. We compare these tools to their porting onto the FastFlow pattern-based programming framework, which provides programmers with high-level parallel patterns. By using a high-level approach, programmers are liberated from all complex aspects of parallel programming, such as synchronisation protocols, and task scheduling, gaining more possibility for seamless performance tuning. In this work, we show some use cases in which, by using a high-level approach for parallelising NGS tools, it is possible to obtain comparable or even better absolute performance for all used datasets.

Assuntos

Alinhamento de Sequência/métodos , Análise de Sequência de DNA/métodos , Algoritmos , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Humanos , Software

On designing multicore-aware simulators for systems biology endowed with OnLine statistics.

Aldinucci, Marco; Calcagno, Cristina; Coppo, Mario; Damiani, Ferruccio; Drocco, Maurizio; Sciacca, Eva; Spinella, Salvatore; Torquati, Massimo; Troina, Angelo.

Biomed Res Int ; 2014: 207041, 2014.

Artigo em Inglês | MEDLINE | ID: mdl-25050327

RESUMO

The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed.

Assuntos

Simulação por Computador , Sistemas On-Line/instrumentação , Design de Software , Estatística como Assunto , Biologia de Sistemas/instrumentação , Bacteriófago lambda/fisiologia , Citosol/metabolismo , Proteínas Fúngicas/metabolismo , Modelos Biológicos , Neurospora/metabolismo , Interface Usuário-Computador

Parallel stochastic systems biology in the cloud.

Aldinucci, Marco; Torquati, Massimo; Spampinato, Concetto; Drocco, Maurizio; Misale, Claudia; Calcagno, Cristina; Coppo, Mario.

Brief Bioinform ; 15(5): 798-813, 2014 Sep.

Artigo em Inglês | MEDLINE | ID: mdl-23780997

RESUMO

The stochastic modelling of biological systems, coupled with Monte Carlo simulation of models, is an increasingly popular technique in bioinformatics. The simulation-analysis workflow may result computationally expensive reducing the interactivity required in the model tuning. In this work, we advocate the high-level software design as a vehicle for building efficient and portable parallel simulators for the cloud. In particular, the Calculus of Wrapped Components (CWC) simulator for systems biology, which is designed according to the FastFlow pattern-based approach, is presented and discussed. Thanks to the FastFlow framework, the CWC simulator is designed as a high-level workflow that can simulate CWC models, merge simulation results and statistically analyse them in a single parallel workflow in the cloud. To improve interactivity, successive phases are pipelined in such a way that the workflow begins to output a stream of analysis results immediately after simulation is started. Performance and effectiveness of the CWC simulator are validated on the Amazon Elastic Compute Cloud.

Assuntos

Armazenamento e Recuperação da Informação , Processos Estocásticos , Biologia de Sistemas , Biologia Computacional , Simulação por Computador

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA